(54) Voice recognition based user interface for wireless devices



(57) The present invention relates to a wireless
communication system (100) that utilizes a remote voice
recognition server system (109) to translate voice input
received from serviced mobile devices (102, 103) into a
symbolic data file (e.g. alpha-numeric or control characters)
that can be processed by the mobile devices. Initially,
a voice communication channel (126) is established
between the serviced mobile device and the voice
recognition server. A user of the mobile device then
begins speaking in a fashion that may be detected by the
voice recognition server system. Upon detecting the user's
speech, the voice recognition server system uses a
voice recognition application to translate the speech into
a symbolic data file, which is then forwarded to the user
through a separate data communication channel (128,
130). The user, upon receiving the symbolic data file at
the mobile device, reviews and edits the content and further
utilizes the file as desired.
Description




BACKGROUND OF THE INVENTION
Field of Invention

[0001] This invention generally relates to data communications,
and in particular to a two-way wireless communication device
that utilizes network based voice recognition resources to
augment the local user interface.

Discussion of Related Art

[0002] The use of hypertext based technologies has spread to
the domain of wireless communication systems. Two-way wireless
communication devices, also described as mobile devices herein,
and wireless network protocols have been designed to permit
interactive access to remote information services (e.g.
commercial databases, email, on-line shopping), through a
variety of wireless and wire-line networks, most notably the
Internet and private networks.

[0003] Many mobile devices (e.g. cellular telephones) are
mass-market consumer-oriented devices. Their user interface
should thus be simple and easy to use without limiting the
functionality of the device. Currently, the primary method of
data entry for most mobile devices is a keypad that is
relatively inefficient when used to input lengthy alphanumeric
character strings. Due to size constraints and cost
considerations, the keypads of these mobile devices are not a
particularly user friendly interface for drafting messages
requiring substantial user input (e.g. email messages). Keypads
of this type usually have between 12 and 24 keys, a sufficient
number for numeric inputs but very inefficient when dealing
with the alphanumeric data entries required for network capable
devices.

[0004] A user requesting information from the Internet
generally navigates the World Wide Web using a browser. For
example, a user requesting information on Stanford University
using Infoseek™ as the search engine would have to input the
following string:

"http://www.infoseek.com" followed by "Stanford University"

[0005] The search string listed above includes over 40
characters. A user would have no problem inputting a string of
this type using a standard desktop computer keyboard and
browser (e.g. Netscape or Explorer). However, the same user
operating the keypad of a mobile device to input the same
string would be severely hampered by the compact keypad and the
close spacing between the keys.

[0006] One of the common uses of the Internet is email. A user
who desires to send an email message having the size of the
paragraph above would have to input over 400 characters. Using
the standard keyboard of a desktop computer, a user may be able
to input that number of characters in less than two minutes
(assuming the user could type with an average degree of skill).
Inputting the same number of keystrokes on the keypad of a
mobile device could take considerably longer and become very
tedious and prone to error.

[0007] Recent advances in voice recognition (VR) technology and
increases in hardware capabilities are making the development
of voice recognition based user interfaces for desktop systems
commercially viable. VR technology takes spoken words and
translates them into a format that can easily be manipulated
and displayed by digital systems. There have been efforts to
equip compact mobile devices with VR technology; however, these
efforts have generally required costly device modifications
such as extra components (e.g. a DSP chip) or increased
processing and storage capability. A typical cellular phone has
computational resources equivalent to less than one percent of
what is provided in a typical desktop or portable computer. A
phone of this type running a scaled down VR application would
only be able to recognize a small, predefined group of spoken
words without modifying the device components.

[0008] Voice recognition software currently available for
desktop and laptop computers (e.g. Naturally Speaking from
Dragon System, Inc., PlainTalk™ from Apple Computer, ViaVoice
98™ from IBM and FreeSpeech 98™ from Philips Talk) generally
costs between $39.00 and several hundred dollars per license.
This would represent a significant portion of the costs of a
mobile device equipped with a comparable software application.

[0009] Placing a voice recognition software application in each
mobile device and modifying its hardware components to run that
application creates a financial disincentive for the handset
manufacturers to incorporate VR features in their devices.
These modifications would add considerable cost to the final
price of the mobile device, possibly pricing them out of the
target price range (e.g. $150.00) usually occupied by
mass-market mobile devices (e.g. cellular telephones).

[0010] In terms of hardware resources, these applications can
require up to 60 Mbytes of memory for each language supported.
Additionally, most of the commercially available voice
recognition software applications are designed to function on
systems having relatively fast processors (e.g. 133 MHz Pentium
processor).

[0011] There is thus a great need for apparatuses and methods
that enable mobile devices to interact in a more efficient
manner with digital computer networks. The ability to utilize
voice recognition services in conjunction with the standard
mobile device user interface (e.g. a phone keypad), without
having to significantly modify hardware resources or costs,
would dramatically improve the usability and commercial
viability of network capable mobile devices having limited
resources.

[0012] The present invention relates to a wireless
communication system that utilizes a remote voice rec- 
ognition server system to translate voice input received 
from mobile devices into a symbolic data file (e.g. alpha- 
numeric or control characters) that can be processed by 
the mobile devices. The translation process begins by 
establishing a voice communication channel between a 
mobile device and the voice recognition server. A user 
of the mobile device then begins speaking in a fashion 
that may be detected by the voice recognition server 
system. Upon detecting the user's speech, the voice 
recognition server system translates the speech into a 
symbolic data file, which is then forwarded to the user 
through a separate data communication channel. The 
user, upon receiving the symbolic data file at the mobile 
device, reviews and edits the content of the symbolic 
data file and further utilizes the file as desired. For example,
a user could use the symbolic data file to fill in
fields in an email or a browser request field. 
[0013] The invention can be implemented in numer- 
ous ways, including as a method, an apparatus or de- 
vice, a user interface, a computer readable memory and 
a system. Several embodiments of the invention are dis- 
cussed below. 

[0014] According to one embodiment, the present in- 
vention is a method for obtaining voice recognition serv- 
ices for a mobile device not having the resources and/ 
or software for performing voice recognition processing
locally. The method comprises using local applications 
resident within the mobile device to establish and coor- 
dinate a voice channel between the subject mobile de- 
vice and a remote server system running a voice recog- 
nition application (referred to herein as a voice recogni- 
tion server system). 

[0015] Upon establishment of the voice channel, the
user of the subject mobile device is cued to begin
speaking into the microphone of the mobile device (e.g. 
a cellular phone). Voiced input received at the voice rec- 
ognition server system, as a result of this interaction, is 
converted into a symbolic data file. This process may 
be assisted by previously stored user specific data files. 
The symbolic data file is then forwarded back to the orig- 
inating mobile device or a designated third party device 
through a separately established and coordinated data 
communication channel. The symbolic data file may be 
used to interact with local applications on the mobile de- 
vice or to interact with network resources (e.g. servers 
on the Internet or a private network). 
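
By way of illustration only, the method of paragraphs [0014] and [0015] may be sketched in Python as follows. The names SymbolicDataFile, recognize and run_session are hypothetical, and the recognizer is a stand-in that treats each incoming item as an already recognized word; no particular voice recognition engine is implied.

```python
from dataclasses import dataclass


@dataclass
class SymbolicDataFile:
    """Result of voice recognition, e.g. plain text (hypothetical type)."""
    device_id: str
    text: str


def recognize(voiced_input):
    """Stand-in recognizer: each item is assumed to be one recognized word."""
    return " ".join(voiced_input)


def run_session(device_id, voiced_input, data_channel):
    """Server side: take input from the voice channel, reply on the data channel."""
    result = SymbolicDataFile(device_id, recognize(voiced_input))
    # Delivery uses a channel that is separate from the voice channel.
    data_channel.append(result)
    return result


data_channel = []  # models the separately established data communication channel
run_session("device-102", ["Stanford", "University"], data_channel)
```

The point of the sketch is the separation of paths: speech arrives on one channel and the symbolic data file leaves on another, so the mobile device never runs the recognizer itself.
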
[0016] Other objects and advantages, together with 
the foregoing are attained in the exercise of the inven- 
tion in the following description and accompanying 
drawings. 

BRIEF DESCRIPTION OF DRAWINGS 

[0017] The present invention will be readily understood
by the following detailed description in conjunction
with the accompanying drawings, wherein the reference
numerals illustrate the structural elements, and in which: 

Figure 1 illustrates a schematic configuration in
which the present invention may be practised; 
Figure 2A depicts display and user interface com- 
ponents of a typical voice capable mobile device; 
Figure 2B illustrates a functional block diagram of 
an exemplary voice capable mobile device; 
Figure 3 illustrates a functional block diagram of a 
link server device according to a preferred embod- 
iment of the present invention; 
Figure 4 is a schematic diagram showing exemplary 
processing stages for the voice recognition server 
in accordance with an exemplary embodiment of 
the present invention; 

Figure 5 shows representative screen displays, 
which illustrate operations relating to the interaction 
of a mobile device with a voice recognition server 
system; 

Figure 6 illustrates a process flowchart from the perspective
of the mobile device according to an embodiment of the
present invention; and
Figure 7 illustrates a process flowchart from the perspective
of the voice recognition server according to an embodiment
of the present invention.

DETAILED DESCRIPTION OF THE INVENTION 



[0018] In the following detailed description of the
present invention, numerous specific details are set
forth in order to provide a thorough understanding of the
present invention. However, it will become obvious to
those skilled in the art that the present invention may be
practised without these specific details. In other instances,
well known methods, procedures, components, and
circuitry have not been described in detail to avoid
unnecessarily obscuring aspects of the present invention.
The detailed description of the present invention in the
following is presented largely in terms of procedures,
steps, logic blocks, processing, and other symbolic
representations that resemble data processing devices
coupled to networks. These process descriptions and
representations are the means used by those experienced
or skilled in the art to most effectively convey the
substance of their work to others skilled in the art.
[0019] The invention pertains to systems and methods
that enable a mobile device to access voice recognition
services from a networked voice recognition
server system. According to one embodiment of the
present invention, voice recognition services are accessed
by establishing a voice channel between the user
of a mobile device desiring voice recognition services
and a networked voice recognition server system.
[0020] Once a voice channel is established, the user
of the mobile device is given a cue to begin speaking
when the voice recognition server system is ready to receive
a speech signal. The received speech signal is
processed by the voice recognition server system using
voice recognition techniques well known in the art (e.g.
template matching, Fourier transforms or linear predictive
coding (LPC)) and a symbolic data file is generated.
[0021] A symbolic data file is a file containing a plu- 
rality of letters, phonemes, words, figures, objects, func- 
tions, control characters or other conventional marks 
designating an object, quantity, operation, function, pho- 
neme, word, phrase or any combination thereof having 
some relationship to the received speech signal as in- 
terpreted by the voice recognition system. Voice recog- 
nition systems generally use voice templates, Fourier 
Transform coding, or a linear predictive coding scheme 
to map the voiced input components to pre-stored sym- 
bolic building blocks. Examples of symbolic data files 
include ASCII files and binary data files. 
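
As a non-limiting illustration of a symbolic data file of the ASCII kind, the following Python sketch joins recognized words into ASCII bytes and maps spoken commands to control characters. The table SPOKEN_CONTROLS and the function to_symbolic_data are hypothetical names, and the choice of spoken commands is an assumption made only for this example.

```python
# Hypothetical mapping from spoken commands to ASCII control characters.
SPOKEN_CONTROLS = {"new line": "\n", "tab": "\t"}


def to_symbolic_data(tokens):
    """Build an ASCII symbolic data file from a list of recognized tokens."""
    parts = []
    for token in tokens:
        if token in SPOKEN_CONTROLS:
            # A control character replaces the space that followed the last word.
            if parts and parts[-1] == " ":
                parts.pop()
            parts.append(SPOKEN_CONTROLS[token])
        else:
            parts.append(token)
            parts.append(" ")
    # Drop the trailing space, then encode as 7-bit ASCII.
    return "".join(parts).rstrip(" ").encode("ascii")


message = to_symbolic_data(["dear", "bob", "new line", "hello"])
```
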
[0022] To facilitate a description of the present inven- 
tion, it is useful to recite some of the features of a com- 
munication system in which the invention may be prac- 
tised. Figures 1 through 4 provide an overview of the 
principal system components. 
[0023] Referring to Figure 1, a block diagram of a typical
communication system according to one embodiment
of the present invention is shown. Mobile devices
102 and 103 receive phone calls through a voice
communication channel and hypermedia information (e.g.
Hyper Text Markup Language (HTML) documents,
Compact Hypertext Markup Language (cHTML) documents,
Extensible Markup Language (XML) documents,
Handheld Device Markup Language (HDML) documents,
Wireless Markup Language (WML) documents,
or other similar data types) from remote server
devices through broad-band and narrow-band (e.g.
SMS) data communication channels which may include
link server device 106 and Short Message Service center
(SMSC) 107.

[0024] Mobile devices 102 and 103 each have a display
and a user interface. Additionally, mobile devices
102 and 103 may have a micro-browser (e.g. a micro-browser
from Phone.com, Inc. 800 Chesapeake Drive,
Redwood City, CA, 94063) stored in a local memory (also
referred to as a client module) which enables the device
to process hypermedia information received from
remote server devices.

[0025] As shown in Figure 1, mobile devices 102 and
103 may be coupled to link server device 106 through
a wireless carrier network 104 (also referred to herein
as a wireless network). Mobile devices 102 and 103 may
be taken from a group that includes mobile phones,
palm sized computing devices and personal digital assistants
with voice transmission and/or reception capabilities.
Voice capabilities are defined as the capabilities
equipped in a mobile device that allow a user to communicate
voice based information to and from remote
destinations (e.g. to another user or a device).
[0026] Access to the voice communication channel
generally requires that the user and/or device be recognized
by wireless carrier network 104. Network recognition
involves the exchange of identification information
between a subject mobile device and wireless carrier
network 104. Generally, the identification information for
the user and/or mobile device in question is stored in
the memory of the device and is transmitted automatically
when the user attempts to access the network.
[0027] Wireless carrier network 104 may be any of the
well known wireless communication networks (e.g. cellular
digital packet data (CDPD) network, Global System
for Mobile Communication (GSM) network, Code Division
Multiple Access (CDMA) network, Personal Handy
Phone System (PHS) or Time Division Multiple Access
(TDMA) network). Link server device 106 is further coupled
to a wired network 108 to which a voice recognition
server system 109 and a plurality of networked servers
represented by network server 113 are coupled.
[0028] Voice recognition server system 109 is comprised
of a server device 110 and storage facilities 112
capable of storing, among other things, user specific
files associated with a plurality of users serviced by a
carrier entity. The user specific files are utilized in conjunction
with voice recognition processing and in one
embodiment are part of the present invention.

[0029] Examples of user specific files might include
user specific speech templates, one or more user specified
language dictionaries (e.g. French, English, German
or Cantonese) and one or more user specific dictionaries
or lists of an individual user's frequently used
words. These files may be uploaded and managed using
a networked multimedia computer (e.g. multimedia
computer 140) or through the user interface of the serviced
mobile device. For example, voice templates are
generated by having the user read a pre-determined
script into a voice-enabled device. User preferences (e.g.
languages of choice) may be input using menu selection
screens presented to the user on the display of the
mobile device or another device connected to the voice
recognition server system via a wired network.
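
One possible way to organize the user specific files described above is sketched below in Python. The UserProfile record, its field names, the default language and the in-memory store are all assumptions made for illustration; the description does not prescribe a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """User specific files kept by the voice recognition server (hypothetical)."""
    user_id: str
    languages: list = field(default_factory=lambda: ["English"])  # assumed default
    frequent_words: set = field(default_factory=set)
    speech_templates: dict = field(default_factory=dict)  # word -> template bytes


profiles = {}  # stands in for storage facilities 112


def load_profile(user_id):
    """Return the stored profile for a user, creating a default one if absent."""
    return profiles.setdefault(user_id, UserProfile(user_id))


# A user adds a second language of choice and a frequently used word.
profile = load_profile("user-1")
profile.languages.append("French")
profile.frequent_words.add("Stanford")
```
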

[0030] For simplicity, antenna 121 represents a wireless
carrier infrastructure that generally comprises a
base station and an operations and maintenance center.
The base station controls radio or telecommunication
links with mobile devices 102 and 103. The operations
and maintenance center comprises a mobile switching
center, which switches calls between the mobile devices
and other fixed or mobile network users. Further, the
operations and maintenance center manages mobile account
services, such as authentication, and oversees
the proper operation and setup of the wireless network.
The hardware components and processes in
carrier infrastructure 121 are known to those skilled in
the art and will not be described herein to avoid unnecessarily
obscuring aspects of the present invention.

[0031] The communication protocols used by airnet
104 may, for example, be Wireless Access Protocol
(WAP) or Handheld Device Transport Protocol (HDTP).
Wired network 108 is a land-based network that may be
the Internet, a private network or a data network of any
private network. Typically, the communication protocol
supporting landnet 108 may be Transmission Control
Protocol (TCP/IP), Hypertext Transport Protocol (HTTP),
or Secure Hypertext Transport Protocol (sHTTP).
[0032] Link server device 106 and network server 113
are typically computer work stations, for example, a
SPARC station from Sun Microsystems Inc.
(http://www.sun.com) with networking libraries and Internet connec-
tivity. Network server 113 is representative of a plurality 
of networked servers coupled to landnet 108 and is ca- 
pable of providing access to hypermedia information in- 
cluding information for mobile devices 102 and 103. 
[0033] Link server device 106 is depicted as a stand 
alone device and therefore is often referred to as a net- 
work gateway or wireless data server. Link server 106 
can be configured to operate as a bridge between wire- 
less network 104 and wired network 108. It should be 
pointed out that the functions of link server device 106 
may be performed by other server devices connected 
to wired network 108 with hardware well known in the 
art providing the connection between wireless network 
104 and wired network 108. 

[0034] The voice communication channel previously 
described is generally represented by voice channel 
126. This communication channel is generally estab- 
lished and coordinated using the infrastructure and pro- 
cedures generally known in the art for setting up a phone 
call. 

[0035] There are generally two types of data commu- 
nication channels providing service to mobile devices 
102 and 103. Data communication channel 128 is rep- 
resentative of a wideband data communication channel. 
Data communication channel 130 is representative of a 
narrowband data communication channel (e.g. a Short
Message Service (SMS) channel). Either
of these data communication paths can be used to
deliver data to and from mobile devices 102 and 103. 
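
The choice between the two data communication channels might, for example, be made as in the following Python sketch. The policy shown (prefer the wideband channel when available, otherwise split the data into short messages) is an assumption and is not taken from the description. The figure of 140 octets is the user data size of a single GSM short message; the extra header that real concatenated messages carry is ignored here.

```python
SMS_PAYLOAD_OCTETS = 140  # user data carried by one GSM short message


def plan_delivery(payload, wideband_available):
    """Return (channel name, list of byte segments) for a symbolic data file."""
    if wideband_available:
        # Wideband channel 128: send the file in one piece.
        return "wideband", [payload]
    # Narrowband channel 130: split into message-sized segments.
    segments = [
        payload[i:i + SMS_PAYLOAD_OCTETS]
        for i in range(0, len(payload), SMS_PAYLOAD_OCTETS)
    ]
    return "narrowband", segments


channel, segments = plan_delivery(b"x" * 300, wideband_available=False)
```
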
[0036] According to the preferred embodiment of the
present invention, a mobile device (e.g. mobile device
102 or 103) desiring to receive voice recognition services
from voice recognition server system 109 first establishes
a voice channel generally represented by
voice channel 126. The contact information for voice
recognition server system 109 (e.g. a phone number or
a uniform resource identifier (URI)) may be embedded
in software loaded on the mobile device, retrieved from
link server device 106 or input by the user directly.
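
The three sources of contact information named above could be consulted as in the following Python sketch. The order of preference (embedded value first, then the link server, then the user) and the function name resolve_contact are assumptions made for illustration; the phone numbers are placeholders.

```python
def resolve_contact(embedded=None, fetch_from_link_server=None, ask_user=None):
    """Return the first available contact (phone number or URI) for the server."""
    if embedded:
        return embedded  # value embedded in software loaded on the device
    # Fall back to the link server, then to direct user input.
    for source in (fetch_from_link_server, ask_user):
        if source is not None:
            contact = source()
            if contact:
                return contact
    raise LookupError("no contact information for the voice recognition server")


# Example: nothing embedded, the link server has no entry, the user types a number.
contact = resolve_contact(
    fetch_from_link_server=lambda: None,
    ask_user=lambda: "tel:+15550100",
)
```
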
[0037] Once a voice channel is established between
the requesting mobile device and voice recognition
server system 109, user information is forwarded to the
voice recognition server system. This allows previously
stored user specific files for the requesting mobile device
to be accessed and utilized. The user information
may be transmitted on a separate data communication
channel (e.g. data communication channels 128 or 130)
or input by the user. The user specific files generally provide
for features specific to a particular user account.
For example, the user may specify one or more languages
of choice for voice recognition processing.
[0038] Once the user specific files for the subject mobile
device/user are retrieved, the user is prompted to
provide a voiced input (e.g. begin speaking). It is important
to note at this point that the user may utilize the user
interface of the mobile device (e.g. a phone keypad)
while utilizing voice recognition services. When the user
has completed their input interaction (voice and physical
input) with the mobile device, an indication may be provided
by the user (voiced or key input) to conclude the
input session. Voice recognition server system 109 then
converts the voiced input into a symbolic data file, which
can be forwarded to the requesting mobile device via
link server 106.
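
An input session that mixes voiced and keyed input and ends on a user indication might be modelled as in the Python sketch below. The event representation and the use of the "#" key as the end indication are assumptions made for this example only.

```python
def collect_input(events, end_marker=("key", "#")):
    """Separate voiced and keyed input until the end indication is seen."""
    voiced, keyed = [], []
    for kind, value in events:
        if (kind, value) == end_marker:
            break  # the user has concluded the input session
        if kind == "voice":
            voiced.append(value)
        else:
            keyed.append(value)
    return voiced, keyed


events = [
    ("voice", "hello"),
    ("key", "1"),
    ("voice", "world"),
    ("key", "#"),          # assumed end indication
    ("voice", "ignored"),  # arrives after the session has ended
]
voiced, keyed = collect_input(events)
```

Only the voiced part would then be handed to the voice recognition application for conversion into the symbolic data file.
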

[0039] As previously stated, the symbolic data file is
a file containing a plurality of letters, phonemes, words,
figures, objects, functions, control characters or other
conventional marks designating an object, quantity, operation,
function, phoneme, word, phrase or any combination
thereof having some relationship to the received
speech signal as interpreted by the voice recognition
system. Voice recognition systems generally use voice
templates, Fourier Transform coding, or a linear predictive
coding scheme to map the voiced input components
to pre-stored symbolic building blocks. Examples of
symbolic data files include ASCII files and binary data
files.

[0040] The symbolic data file may initially be forwarded
to link server device 106, which may perform additional
processing prior to forwarding the symbolic data
file to the requesting mobile device via wideband channel
128 or narrowband channel 130. The user of the mobile
device may then review the received symbolic data
file and utilize it as desired.

[0041] The accuracy of the voice recognition application
used by voice recognition server system 109 will
vary depending on the translation methodology used
and the size and language of the language dictionaries
used. Generally, speaker dependent methodologies (e.g.
template matching) have accuracies as high as 98
percent and speaker-independent methodologies (e.g.
Fourier transforms and linear predictive coding (LPC))
have accuracies in the range of 90 to 95 percent
(www.hitl.washington.edu - Voice Recognition, Jim Baumann).
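
To put these figures in perspective, the short Python calculation below estimates how many words a user would need to correct in a dictated message. It assumes that the quoted accuracy is a per-word rate and that the message is 100 words long; both are assumptions made only for this example.

```python
def expected_errors(words, accuracy):
    """Expected number of misrecognized words at a given per-word accuracy."""
    return words * (1.0 - accuracy)


# 100-word message: speaker dependent vs. speaker independent recognition.
for label, accuracy in [("template matching", 0.98),
                        ("speaker independent, best", 0.95),
                        ("speaker independent, worst", 0.90)]:
    print(f"{label}: about {expected_errors(100, accuracy):.0f} corrections")
```

Under these assumptions the user would correct roughly 2 words with a speaker dependent method and between 5 and 10 words with a speaker-independent one, which is why the review and edit step at the mobile device matters.
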
[0042] In accordance with the principles of the present
invention, users of mobile devices (e.g. mobile devices
102 and 103) may access voice recognition services on
those mobile devices without the significant hardware
or software modifications that might be required if the
voice recognition application were executed by the device.
Additionally, since the software performing voice
recognition processing is resident on an accessible remote
server device with superior processing speed (as
compared to that of the mobile device) and large storage
capacity, the user of the device can be provided with the
functionality and resources associated with a full featured
voice recognition application. For example, the
voice recognition application may have access to large
language dictionaries, selectable language dictionaries
for multiple languages and user specific files (e.g. voice
templates and user customized dictionaries and lists).
[0043] Figure 2A depicts an exemplary mobile device 
200 that may correspond to one of mobile devices (102 
or 103) in Figure 1. Mobile device 200 includes a display
screen 204, an extended phone-styled keypad 210, cursor
navigation keys 222 and 224, a pair of softkeys 208A
and 208B, an earpiece 212A and a microphone 212B.
Screen display 204 is typically a Liquid Crystal Display 
(LCD) screen capable of displaying textual information 
and certain graphics. Extended phone keypad 210 in- 
cludes, preferably, a regular phone keypad with addi- 
tional keys providing additional characters (e.g. a 
space) and functions (e.g. back or clear). 
[0044] Cursor navigation keys 222 and 224 allow a 
user to reposition a cursor or an element indicator 216,
for example, to activate one of the applications dis- 
played on screen display 204. Generic keys 208A and 
208B are typically used to perform application specific 
functions as indicated by softkey function identifiers 214 
and 215. It should be understood, by those having ordi- 
nary skill in the art, that having a regular phone keypad 
is not a requirement to practice the present invention. 
Some mobile devices have no physical keys
at all, such as those palm-sized computing devices that 
use soft keys or icons as an input mechanism. 
[0045] Upon establishing a communication session 
with an associated link server device (e.g., link server 
device 106 of Figure 1) mobile device 200 typically re- 
ceives one or more markup language card decks to as- 
sist the user with device interactions. Depending on the 
implementation preference, the markup language card 
decks, alternatively referred to as screen descriptive 
commands files, may be in a markup language that includes,
but is not limited to, Handheld Device Markup
Language (HDML), Hypertext Markup Language 
(HTML), compact HTML, Wireless Markup Language 
(WML), Standard Generalized Markup Language 
(SGML) or Extensible Markup Language (XML). Alter- 
natively, the data file may be a stripped, compressed, 
compiled or converted version of a corresponding 
markup file. 

[0046] The text appearing on LCD screen 204 in Fig- 
ure 2A is an example of such a display screen. In this 
example the user is offered a choice of the following se- 
lections: 

1) Bookmarks 

2) Search InL 

3) Email 

4) News 

Each of the selections is typically linked to a resource
on the network or to a local software application. A user
may make a selection from the menu above using navigation
keys 222 and 224 with the user's selection indicated
by element indicator 216. This same method may
be utilized to provide user prompts for interacting with
remote server devices (e.g. voice recognition server
system 109 of Figure 1).
[0047] Referring now to Figure 2B, a more detailed
description of mobile device 250, which may be mobile
devices 102 or 103 of Figure 1 and 200 of Figure 2A, is
provided. Mobile device 250 includes a Wireless Control
Protocol (WCP) interface 252 that couples to a carrier
wireless network 104 to receive incoming and outgoing
signals. Device identifier (ID) storage 254 stores and
supplies a device ID to WCP interface 252 for the purpose
of identifying mobile device 250 to outside entities
(e.g. link server device 106 of Figure 1). The device ID
is a specific code that is associated with mobile device
250 and directly corresponds to the device ID in an associated
user account typically provided in an associated
link server device (e.g. 106 of Figure 1).
[0048] Mobile device 250 includes a processor 268,
encoder/decoder circuitry 264, working memory 258
and a client module 256. Client module 256 is representative
of software components loaded on or into device
memory resources, which perform many of the
processing tasks performed by mobile device 250
including: establishing a communication session with a
link server device via wireless carrier network 104,
operating and maintaining local applications, displaying
information on a display screen 260 of mobile device 250,
and receiving user input from keypad 262. Client module
256 may be loaded into the memory of mobile device
250 in much the same fashion as software is loaded on
a computing device.

[0049] In addition, mobile device 250 includes voice 
circuitry 266 for converting voice activity to electrical
impulses which may be transmitted and received on digital
and analog communication systems. These compo- 
nents and their functions are well known in the art and 
will not be discussed further. 

[0050] In accordance with the principles of the present
invention, the software loaded on mobile device 200
includes a component which provides assistance to the
user relating to interactions with the server device running
the voice recognition application. The software providing
this assistance may be loaded as part of the
microbrowser or other application, or as a stand alone
application. This application may be responsible for tasks
such as retrieving and storing contact information for
server devices providing services, management of received
symbolic data files, and input/alteration of user
preferences. User assistance may be in the form of
screen displayed information, audible or tactile prompts
and/or softkey mapped functions, for example.
[0051] For example, a user desiring to utilize voice
recognition services in conjunction with an application
(e.g. an email message) may access the application of
interest and activate a softkey to access voice recognition
services. The function associated with the softkey
would then retrieve the contact information for the server
device running the voice recognition application, if not
already stored, and the process would proceed as described
above. This example is provided for purposes
of illustration and should not be interpreted as limiting
the scope of the present invention.
[0052] Figure 3 schematically illustrates the principal
components of link server device 340, which may correspond
to link server device 106 of Figure 1. Link server
device 340 is a server computer that operates as a network
gateway between wired network 300 and wireless
network 320. To avoid obscuring the principal aspects
of the present invention, well-known methods, procedures,
components and circuitry in link server device
340 are not described in detail.
[0053] Link server device 340 includes a Land Control 
Protocol (LCP) interface 358 that couples to wired net- 
work 300, and a Wireless Control Protocol (WCP) inter- 
face 341 that couples to wireless network 320. A server 
module 310 is coupled between the LCP interface 358 
and the WCP interface 341.

[0054] Server module 310 performs traditional server
processing as well as protocol conversion processing 
from one communication protocol to another communi- 
cation protocol. Message processor 316 is the compo- 
nent responsible for protocol conversions and associat- 
ed tasks. In the case of protocol conversions (e.g. be- 
tween HDTP and HTTP), the conversion is generally a 
data mapping process. It will be understood by those 
skilled in the art that WCP interface 341 can be replaced 
by other interface modules depending on the wireless 
networks and protocols used. The same is true of LCP 
interface 358 when the type of wired network and pro- 
tocol vary. 

[0055] Server module 310 also includes an account 
manager 312 and an account interface 314. Account 
manager 312 manages a plurality of user accounts, typ- 
ically one for each of the mobile devices serviced by link 
server device 340. It should be understood that the user 
account information may be stored in another network 
server coupled to the link server device 340. In other
words, the user accounts can be kept in a database that 
is physically placed in any computing device coupled to 
link server device 340 via a wired network. 
[0056] Each of the mobile devices serviced by link 
server device 340 is assigned an identification (ID) or 
device ID. A device ID can be a phone number of the 
device or an IP address or a combination of an IP ad- 
dress and a port number, for example: 
204.163.165.132:01905 where 204.163.165.132 is the
IP address and 01905 is the port number. The device 
ID is further associated with a subscriber ID created and 
administrated by the carrier controlling link server de- 
vice 340 as part of the procedures involved in activating 
a subscriber account for the mobile device. The subscriber
ID may be associated with, and utilized to access, the
user specific files (e.g. 112 of Figure 1) associated with 
a particular user or device. 
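
As an illustration of the "IP address and port number" form of device ID
described above, the following minimal Python sketch splits such an ID into
its parts and looks up an associated subscriber ID. The names DeviceId,
parse_device_id, SUBSCRIBERS and subscriber_for, and the value
"subscriber-0001", are hypothetical and are not part of the described
system; a phone-number device ID would need separate handling.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Dict, Optional


@dataclass(frozen=True)
class DeviceId:
    """Hypothetical container for a device ID of the 'IP:port' kind."""
    ip: IPv4Address
    port: int


def parse_device_id(text: str) -> DeviceId:
    """Split 'a.b.c.d:port' into its address and port parts."""
    host, sep, port_text = text.rpartition(":")
    if not sep:
        raise ValueError(f"expected 'ip:port', got {text!r}")
    port = int(port_text)  # '01905' parses as 1905
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return DeviceId(IPv4Address(host), port)


# Hypothetical table associating device IDs with subscriber IDs.
SUBSCRIBERS: Dict[DeviceId, str] = {
    parse_device_id("204.163.165.132:01905"): "subscriber-0001",
}


def subscriber_for(text: str) -> Optional[str]:
    """Return the subscriber ID for a device ID string, if one is known."""
    return SUBSCRIBERS.get(parse_device_id(text))
```

Because the port is stored as an integer, "01905" and "1905" refer to the
same device in this sketch; whether the described system treats them as
equal is not stated.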

[0057] The subscriber ID may take the form of, for ex-
ample, 861234567-[illegible]_pn.mobile.att.net by AT&T
Wireless Service, and is a unique identification to a mo-
bile device. The account manager 312 is responsible for
creating a user account for a mobile device that allows
for secure communications with link server device 340.
In this case, account manager 312 ensures the proper
level of secure access for the serviced mobile device to
services provided by link server device 340.
[0058] Link server device 340 also includes a proces-
sor 318 and storage resource 320 as the primary hard-
ware components. Processor 318 performs operations
under the control of the server module 310. It will be
understood by those skilled in the art that link server de-
vice 340 may include one or more processors (e.g.,
processor 318), working memory (e.g., storage re-
source 320), buses, interfaces, and other components
and that server module 310 represents one or more soft-
ware modules loaded into the working memory of link
server device 340 to perform designated functions. The
same distinction is equally applicable to the client mod-
ule and hardware components of the subject mobile de-
vices.

[0059] Typically the landnet communication protocol
(LCP) supported in landnet 300 may include Transmis-
sion Control Protocol (TCP), HyperText Transfer Proto-
col (HTTP) or Secure HyperText Transfer Protocol (HT-
TPS), and the wireless communication protocol (WCP)
may include TCP, HTTP or HTTPS, Handheld De-
vice Transport Protocol (HDTP) or Wireless Session
Protocol (WSP). In the case that LCP is different from
WCP, server module 310 includes a mapping module (i.e.
a mapper) responsible for mapping from one protocol
to another so that a mobile device coupled to wireless
network 320 can communicate with a device coupled to
wired network 300.
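
The mapping module described above can be pictured as a table of
per-protocol-pair conversion functions. The following Python sketch is an
assumption-laden illustration only: the field names ("target", "url",
"body", "content") are invented for the example and are not real HDTP or
HTTP fields, and the names MAPPERS and convert are hypothetical.

```python
from typing import Callable, Dict, Tuple

Message = Dict[str, str]
Mapper = Callable[[Message], Message]


def _rename(fields: Dict[str, str]) -> Mapper:
    """Build a mapper that renames message fields and keeps their values."""
    def apply(msg: Message) -> Message:
        return {fields.get(key, key): value for key, value in msg.items()}
    return apply


# Hypothetical field names; real protocols carry far more than this.
MAPPERS: Dict[Tuple[str, str], Mapper] = {
    ("HDTP", "HTTP"): _rename({"target": "url", "body": "content"}),
    ("HTTP", "HDTP"): _rename({"url": "target", "content": "body"}),
}


def convert(msg: Message, src: str, dst: str) -> Message:
    """Map a message from protocol src to protocol dst."""
    if src == dst:
        return dict(msg)  # LCP and WCP match, so no mapping is needed
    try:
        mapper = MAPPERS[(src, dst)]
    except KeyError:
        raise ValueError(f"no mapper from {src} to {dst}") from None
    return mapper(msg)
```

Keeping the mappers in a table keyed by protocol pair mirrors the remark
that either interface may be replaced when the network or protocol
changes: supporting a new pair means adding an entry, not editing convert.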

[0060] Once the received speech signal is processed
by the voice recognition server system (not shown), a sym-
bolic data file is generated and forwarded to link server
device 340. The symbolic data file is received by mes-
sage processor 316 via LCP interface 358. Message
processor 316 converts the symbolic data file to a data
format that may be optimally (in terms of the protocol
requirements of the wireless network and the device
characteristics of the requesting mobile device) trans-
ported on wireless network 320. The symbolic data file
can be in a format comprehensible by message proces-
sor 316, for example, in a markup language (e.g. HTML)
or a text file (e.g. ASCII), when received from the voice
recognition server system. The processed symbolic da-
ta file, which may be reformatted so as to be more com-
patible with the requesting mobile device, is then for-
warded to the requesting mobile device or to a desig-
nated third party device.
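
The reformatting step performed by the message processor might look
roughly like the Python sketch below, which takes a plain-text symbolic
data file and either wraps it in a bare WML-style card or reduces it to
ASCII, depending on what the requesting device is assumed to accept. The
function names, the "wml" capability flag and the card id "vr" are
hypothetical, and the markup is deliberately minimal (no XML prolog or
document type declaration).

```python
from typing import Set
from xml.sax.saxutils import escape, quoteattr


def to_wml(text: str, title: str = "VR result") -> str:
    """Wrap each non-empty line of text in a paragraph of a single card."""
    paragraphs = "".join(
        f"<p>{escape(line)}</p>" for line in text.splitlines() if line.strip()
    )
    return f'<wml><card id="vr" title={quoteattr(title)}>{paragraphs}</card></wml>'


def format_for_device(text: str, accepted: Set[str]) -> str:
    """Pick an output format from the device's (assumed) capability set."""
    if "wml" in accepted:
        return to_wml(text)
    # Fall back to plain ASCII, replacing characters the device cannot show.
    return text.encode("ascii", "replace").decode("ascii")
```

For example, format_for_device("a < b", {"wml"}) escapes the "<" so the
recognized text cannot be mistaken for markup by the device.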

[0061] Referring to Figure 4, there are shown func-
tional modules of an exemplary voice recognition server
system 460 (which may correspond to voice recognition
server system 109 of Figure 1) that performs the follow-
ing processes: 1) Speech Detection, 2) Speech Analy-
sis, 3) Pattern Matching and 4) Symbolic File Genera-
tion. During speech detection 462, voice recognition
server system 460 detects the presence of a speech sig-
nal at its input. Upon detection, the received speech sig-
nal goes through the speech analysis process 464,
where it is reduced to quantifiable indices suitable for
pattern matching. During the pattern matching stage
466, the quantifiable indices are compared to user voice
templates (if using a template based voice recognition
process) stored in storage device 480 that may include
various language dictionaries and a plurality of user spe-
cific files. The symbolic data file is forwarded to link serv-
er device 340 via wired network 300 (see Figure 3) as
previously described. It will be understood, by persons
of ordinary skill in the art, that other voice recognition
schemes (e.g. Fourier transforms or linear predictive
coding (LPC)) may be used without deviating from the
scope of the invention. Persons of ordinary skill in the
art will also understand that the link server device (e.g.
106 of Figure 1) may perform the functions of the voice
recognition server system (e.g. 109 of Figure 1).
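
The four stages named above can be sketched as a toy template-matching
pipeline in Python. This is a teaching illustration under strong
assumptions, not the recognizer of the described system: the "quantifiable
indices" are taken to be mean absolute amplitude and zero-crossing rate,
the threshold value is arbitrary, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, float]


def detect_speech(samples: Sequence[float], threshold: float = 0.1) -> bool:
    """Stage 1: report speech when the mean absolute amplitude is high enough."""
    return bool(samples) and sum(abs(s) for s in samples) / len(samples) > threshold


def analyze(samples: Sequence[float]) -> Features:
    """Stage 2: reduce a signal to (mean absolute amplitude, zero-crossing rate)."""
    energy = sum(abs(s) for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return energy, crossings / max(len(samples) - 1, 1)


def match(features: Features, templates: Dict[str, Features]) -> str:
    """Stage 3: pick the template word nearest to the features."""
    def distance(word: str) -> float:
        return sum((x - y) ** 2 for x, y in zip(features, templates[word]))
    return min(templates, key=distance)


def recognize(utterances: List[Sequence[float]], templates: Dict[str, Features]) -> str:
    """Stage 4: emit the matched words as a plain-text symbolic data file."""
    words = [match(analyze(u), templates) for u in utterances if detect_speech(u)]
    return " ".join(words)
```

Silent segments are dropped at stage 1, so only detected speech reaches
the analysis and matching stages.
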
[0062] Figure 5 illustrates a plurality of exemplary
screen displays relating to the interaction of a mobile
device requesting voice recognition services and a
voice recognition server system. Initially screen display
500 allows a user to select between manual entry 504
and VR (voice recognition) assisted entry 508. User se-
lections are indicated by selection indicator 512. In this
example, VR assisted entry 508 may be selected by ac-
tivating the softkey associated with softkey function
identifier 516. This selection retrieves the contact infor-
mation for the voice recognition server system providing
service. In this example the contact information is com-
prised of a phone number (e.g. 650-555-7272). One of
ordinary skill in the art will understand that the contact
information may also be comprised of a Uniform Re-
source Identifier (URI) or similar unique identifier. Asso-
ciated user and/or device identification information, uti-
lized for accessing user specific files, may be transmit-
ted in the background (e.g. using a separate data com-
munication channel or the voice communication chan-
nel) or input by the user.

[0063] Upon retrieval of the voice recognition server
system contact information 522, as shown in screen dis-
play 520, a voice channel may be established by acti-
vating the softkey associated with softkey function iden-
tifier 524 ("OK"). Screen display 530 illustrates types of
information which could be provided to the user of the
requesting mobile device. Character string 532 provides
the user with information relating to the status of estab-
lishing a communication session with the voice recog-
nition server system providing service. Character string
534 provides the user with information relating to the
settings utilized to process the user's request. This
could be comprised of a simple character string (e.g. "In-
itializing Default Settings") or a plurality of interactive
and non-interactive displays which allow a user to input
selections (e.g. a language of choice). When the serv-
icing voice recognition server system is ready to receive
input, a prompt 536 ("Begin speaking") is presented to
the user. A user may end the input session by activating
the softkey associated with softkey function identifier
538.

[0064] Voice recognition services may be configured 
to interact with particular applications resident on the re- 
questing mobile device. For example, processed sym- 
bolic data files may be generated to serve as inputs for 
specific fields in an application such as an email. Addi- 
tionally, once an active voice channel has been estab- 
lished for voice recognition services, the user may 
change the application using the service without having 
to secure and re-establish the voice communication 
channel. For example, the user may switch between an 
email program and a personal organizer. This feature 
reduces user cost and network congestion. 
[0065] Referring now to Figure 6 and Figure 7, there 
are respectively illustrated process flowcharts that de- 
scribe the operations of the mobile device and the voice 
recognition server according to one embodiment of the 
present invention. Both Figure 6 and Figure 7 should be 
understood in conjunction with Figure 1. 
[0066] In accordance with the preferred embodiment
of the present invention, a user desiring voice recogni-
tion services would initiate a request for the service by
using the local user interface (e.g. by pressing a key).
Generally the user would do so in conjunction with a de-
sired task being performed using resident applications
(e.g. email or web browsing). Information returned to the
mobile device as a result of the request may be incor-
porated within a document associated with the task be-
ing performed.

[0067] The request process causes a voice channel
to be established between the mobile device requesting
service and the voice recognition server system provid-
ing the service. Once the voice channel is established
and the user is cued to begin speaking, the user may
begin an input interaction with the mobile device which
may include physical input using the local user interface
(e.g. a phone keypad) in addition to the voiced input.
Upon completion of the initial input interaction with the
mobile device the user may choose to keep the voice
channel open and perform another task, or terminate
the voice channel.
[0068] Figure 6 is a flow diagram, which illustrates the 
process 600 utilized by a mobile device (e.g. mobile de- 
vices 102 and 103) to interact with a remote voice rec- 
ognition server system (e.g. voice recognition server 
system 109) from the perspective of the mobile device. 
At 604 a determination is made as to whether there is 
an active voice channel between the subject mobile de- 
vice and the voice recognition server system providing 
services. This process usually occurs in the background 
under software control. 

[0069] If there is an active voice channel, then the us-
er is prompted to provide an input at 608 indicating
whether the user desires the active voice channel to be
disabled. This would be the case where the user does
not require VR services for the planned input interaction
with the mobile device.

[0070] If the user decides to disable the voice channel
then it is disabled at 612. The user then proceeds with
physical input 628 using the device's user interface (e.g.
the keypad). At 622 a decision is made as to whether
user input (e.g. physical input 628) has been registered
(e.g. input accepted by the device). If the user input is
registered then it is processed at 632 and the user is
prompted to provide an input at 636 indicating whether
to continue the input session or terminate it. If the user
selects termination then a determination is made as to
the status of established voice channels/circuits at 640
(i.e. is the voice channel/circuit active). As was previ-
ously described, this check usually occurs in the back-
ground. In the sequence described above there is not
an active voice channel so the process would be termi-
nated.

[0071] If at 608 the user decides not to disable the
voice channel, as would be the case where a user in-
tends to utilize voice recognition services for a mobile
device input interaction, then the user provides voice in-
put 624 and physical input 628 and a determination is
made at 622 as to whether the user's input has been
registered. If the user's input has been registered then
it is processed at 632 and the user is prompted to provide
an indication at 636 as to whether the user desires to
continue the input session or terminate it. If the user se-
lects termination then a determination is made as to the
status of any established voice channels/circuits at 640.
Upon termination, active voice channels/circuits are se-
cured. The process is then terminated.
[0072] If the user decides not to terminate the input 
session at 636 then the process returns to the beginning 
of process 600. 

[0073] If a determination is made at 604 that there is 
not an active voice channel, then the user is prompted 
to provide an indication at 614 as to whether the user 
desires an active voice channel to be established. This 
would be the case where the user requires voice recog- 
nition services for an input interaction with the mobile 
device. 

[0074] If at 614 the user requests a voice channel for
the input interaction then one is established at 618. The
user then provides voice input 624 and physical input
628 and a determination is made at 622 as to whether
the user's input has been registered. If the user's input
has been registered then it is processed at 632 and the
user is prompted to provide an indication at 636 whether
to continue the input session or terminate it. If the user
selects termination then a determination is made, as
previously described, as to the status of any established
voice channels/circuits at 640. Upon termination, active
voice channels/circuits are secured. The process is then
terminated.

[0075] If at 614 the user does not request a voice
channel for the impending input interaction, as would be
the case where the user does not require voice recog-
nition services, the user then proceeds with physical in-
put 628 using the mobile device user interface (e.g. the
keypad). At 622 a determination is made as to whether
user input (e.g. physical input 628) has been registered.
If the user input has been registered then it is processed
at 632 and a decision is made at 636 whether to continue
the input session or terminate it. If termination is se-
lected, the process is then terminated.

[0076] If the user decides not to terminate the input
session at 636 then the process returns to the beginning 
of process 600. 

[0077] If in any of these exemplary interactions de-
scribed above, the user input is not registered at 622,
then the user is prompted to provide an indication at 636
as to whether or not they desire to terminate the session
with the voice recognition server system.
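
The branches of process 600 described in the preceding paragraphs can be
summarized in a short Python sketch. It is a simplified reading of the
flow, not an implementation from the source: the user's answers and the
"input registered" determination are supplied as a scripted sequence of
booleans, and the names Session, decide and process_600 are hypothetical.
A True answer at step 636 is taken to mean "terminate".

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Session:
    answers: Iterator[bool]        # scripted outcomes of each decision, in order
    channel_active: bool = False   # state examined at steps 604 and 640
    log: List[str] = field(default_factory=list)

    def decide(self, step: str) -> bool:
        """Consume the next scripted outcome and record it against a step."""
        answer = next(self.answers)
        self.log.append(f"{step}:{'yes' if answer else 'no'}")
        return answer


def process_600(s: Session) -> List[str]:
    while True:
        if s.channel_active:                 # 604: channel already active
            if s.decide("608"):              # 608: disable the channel?
                s.channel_active = False     # 612
        elif s.decide("614"):                # 614: establish a channel?
            s.channel_active = True          # 618
        # 624/628: voice input is only available with an active channel
        inputs = "voice+physical" if s.channel_active else "physical"
        if s.decide("622"):                  # 622: was the input registered?
            s.log.append(f"632:process {inputs}")
        if s.decide("636"):                  # 636: terminate the session?
            if s.channel_active:             # 640: secure any active channel
                s.channel_active = False
                s.log.append("640:channel secured")
            return s.log
```

Unregistered input skips step 632 and falls through to the prompt at 636,
and declining termination returns to the check at 604.
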
[0078] Once a voice channel between the voice rec-
ognition system providing service and a mobile device
requesting service is established, the voice recognition
server system may retrieve any user specific files asso-
ciated with the user of the mobile device (e.g. language
preferences, template files etc.) and use these to proc-
ess the incoming voice input. The voice recognition
server system then detects and processes incoming
voice signals associated with the request for service.
The incoming voice signal is converted into a symbolic
data file using a template matching process, Fourier
transform method, linear predictive coding scheme or
any suitable voice recognition coding scheme and for-
warded to the requesting mobile device (or a designated
third party device) using a data communication channel
that may include an intermediate server device (e.g. link
server device 106 of Figure 1).

[0079] The symbolic data file may be in a format that
is suitable for processing by the requesting mobile de-
vice (e.g. cHTML, WML or HDML) or may be in a format
suitable for processing by an intermediate server device
(e.g. HTML, WML, XML, ASCII etc.). In the latter case
the intermediate server device may perform any conver-
sion process required, if any.

[0080] According to the principles of the present in-
vention, a user interacting with a mobile device would
be able to access remotely available voice recognition
services based in a server device running a voice rec-
ognition application (e.g. a voice recognition server sys-
tem). Software stored on the phone (e.g. a microbrows-
er) assists the user in this interaction by retrieving and
managing contact information for the server device and
by providing prompts and performing functions related
to interactions with the voice recognition server system.
Using this system and method, mobile devices having
limited processing and storage capability have access
to full featured voice recognition applications running on
powerful computer workstations.

[0081] Figure 7 is a flow diagram, which illustrates the
process 700 utilized by a voice recognition server sys-
tem (e.g. voice recognition server system 109) to inter-
act with a mobile device (e.g. mobile device 102) from
the perspective of the voice recognition server system.
At 704 a determination is made (i.e. by a software proc-
ess) as to whether a voice circuit/channel has been es-
tablished between the voice recognition server system
(e.g. voice recognition server system 109) and a mobile
device requesting services (e.g. mobile device 102).
[0082] If at 704 it is determined that a voice circuit/ 
channel has been established with a mobile device re- 
questing services then another determination is made 
at 708 as to whether a speech signal has been detected. 
If a speech signal is detected at 708 the received speech 
input 716 is utilized to generate a symbolic data file at 
712. 

[0083] As previously stated, the symbolic data file is 
a file containing a plurality of letters, phonemes, words, 
figures, objects, functions, control characters or other 
conventional marks designating an object, quantity, op- 
eration, function, phoneme, word, phrase or any combi- 
nation thereof having some relationship to the received 
speech signal as interpreted by the speech recognition 
system. Voice recognition systems generally use voice 
templates, Fourier Transform coding or a linear predic-
tive coding scheme to map the voiced input components
to pre-stored symbolic building blocks. Examples of 
symbolic data files include ASCII files and binary data 
files. 

[0084] The symbolic data file is then forwarded to the 
requesting mobile device (or designated third party de- 
vice) at 720. At 724 it is determined whether a termina- 
tion command has been received from the mobile de- 
vice requesting services. If a termination command is 
received then the process is ended. If a termination
command is not received then the process continues to 
look for an incoming speech signal at 708. If at 708 a 
speech signal is not received within a pre-determined 
time period then a determination is made at 728 as to 
whether a termination command has been received. If 
a termination command has been received then the 
process is terminated. Of course the system could have 
pre-determined time-outs or cycle limits that could result 
in process termination even if a termination command 
has not been received. 

[0085] If at 704 it is determined that a voice circuit/ 
channel has not been established with a mobile device 
requesting services then the voice recognition server 
system awaits the establishment of an active voice 
channel with a mobile device desiring voice recognition 
services. 
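
The server-side loop of process 700 can be sketched in Python as follows,
assuming the voice channel at 704 is already established. The event
vocabulary ("speech", "timeout", "terminate"), the max_idle cycle limit
and the function name process_700 are hypothetical choices made for this
illustration; the recognize and forward callables stand in for steps 712
and 720.

```python
from typing import Callable, Iterable, Optional, Tuple

Event = Tuple[str, Optional[str]]


def process_700(
    events: Iterable[Event],
    recognize: Callable[[str], str],
    forward: Callable[[str], None],
    max_idle: int = 3,
) -> str:
    """Handle channel events until termination, a cycle limit, or channel end."""
    idle = 0
    for kind, payload in events:
        if kind == "terminate":               # 724 / 728: termination command
            return "terminated"
        if kind == "speech" and payload is not None:   # 708: speech detected
            forward(recognize(payload))       # 712: convert, 720: forward
            idle = 0
        else:                                 # no speech within the time period
            idle += 1
            if idle >= max_idle:              # optional cycle limit
                return "timed out"
    return "channel closed"
```

The max_idle check corresponds to the remark that pre-determined
time-outs or cycle limits may end the process even when no termination
command arrives.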

[0086] According to the principles of the present in-
vention, the voice recognition server system functions
as an extension of the user interface of the mobile de-
vice. For example a user can choose to use voice rec-
ognition services for lengthy interactions that would un-
der normal circumstances require considerable time
and effort to input using the local user interface. In ad-
dition, since the resources of the mobile device do not
limit the voice recognition application used, the user can
be provided access to a vast vocabulary.
[0087] The advantages of the present invention are
numerous. Different implementations may yield one or
more of the following advantages. One advantage of the
present invention is that users of certain mobile devices
(e.g. devices with limited processing and storage capa-
bility) are enabled to utilize a fully functional voice rec-
ognition application running on a remote server device
to augment the standard device user interface.
[0088] Another advantage of the present invention is
that since the voice recognition application utilized is not
restricted by the processing and storage limitations of
the mobile device, the user may be provided with the
functionality of a full featured voice recognition applica-
tion running on a more powerful computer. Advantages
associated with this functionality include providing the
user with multiple language dictionaries having large vo-
cabularies and personalized dictionaries. Additionally,
since the voice recognition application is not stored on
the mobile device, there is little or no impact on the per-
unit cost of the mobile device. Still another advantage
of the present invention is that the carriers providing this
service can charge the users a small service fee for ac-
cess to it.

[0089] Yet another advantage of the present invention
is that a user can utilize voice recognition services and
the local user interface (e.g. a phone keypad) concur-
rently, thus providing the user with maximum flexibility.
For example, the user can input a voice signal and in-
termingle symbols from the local user interface.

[0090] The many features and advantages of the
present invention are apparent from the written descrip-
tion, and thus, it is intended by the appended claims to
cover all such features and advantages of the invention.
Further, since numerous modifications and changes will
readily occur to those skilled in the art, it is not desired
to limit the invention to the exact construction and oper-
ation as illustrated and described. Hence, all suitable
modifications and equivalents may be resorted to as fall-
ing within the scope of the invention.



Claims 



1. A method of providing voice recognition services to
a wireless communication device having a display
screen and a user interface, the method compris-
ing:-

receiving a request from the wireless commu-
nication device for voice recognition services at
a server device running a voice recognition ap-
plication;
retrieving a voice input signal associated with
the request from a first communication path;
converting the voice input signal into a symbolic
data file using the voice recognition application;
and
forwarding the symbolic data file to the wireless
communication device using a second commu-
nication path.

2. A method as recited in claim 1, wherein the first
communication path is established on a wireless
communication network.

3. A method as recited in claim 1 or 2, wherein the
symbolic data file is a markup language file.

4. A method as recited in any preceding claim, where-
in the second communication path includes a link
server device connected to the server device run-
ning the voice recognition application by a wired
network using a first communication protocol and to
the wireless communication device by a wireless
network using a second communication protocol.

5. A method of providing voice recognition services to
a wireless communication device having a display
screen and a user interface, the method compris-
ing:-

retrieving contact information for a server de-
vice running a voice recognition application;
generating a request for voice recognition serv-
ices from the server device associated with the
retrieved contact information;
forwarding the request for voice recognition
services to the server device associated with
the retrieved contact information;
establishing a voice communication channel
between the wireless communication device
and the server device associated with the re-
trieved contact information;
receiving input from a user using the wireless
communication device, at least a portion of the
input including a voice component; and
transmitting the user input to the server device
for processing by the voice recognition applica-
tion.

6. A method as recited in claim 5, further comprising:-

receiving a symbolic data file from the server
device associated with the retrieved contact in-
formation, the symbolic data file including the
processed output of voice recognition process-
ing of the user input by the server device;
processing the received symbolic data file us-
ing the local resources of the wireless commu-
nication device; and
displaying at least a portion of the processed
symbolic data file to the user for review and
modification.

7. A method as recited in claim 6, wherein the symbol-
ic data file is an [illegible] formatted data file.



8. A computer readable medium on which is encoded
computer program code for generating a request for
voice recognition services for a wireless communi-
cation device, the medium comprising:-

computer program code for retrieving contact
information for a server device providing voice
recognition services;
computer program code for generating a re-
quest for voice recognition services from the
server device associated with the retrieved
contact information;
computer program code for receiving voice in-
put from a user of the wireless communication
device, the input being associated with the re-
quest for voice recognition services; and
computer program code for establishing a voice
communication session between the wireless
communication device and the server device
for the purpose of transmitting a voice signal to
the server device for voice recognition process-
ing.


9. A computer readable medium on which is encoded
computer program code for providing voice recog-
nition services to a wireless communication device,
the medium comprising:-

computer program code for processing a re-
quest for voice recognition services received
from the mobile device;
computer program code for receiving a voice
input associated with the request for voice rec-
ognition services;
computer program code for converting the re-
ceived voice input into a symbolic data file; and
computer program code for forwarding the
symbolic data file to the mobile device originat-
ing the request.

10. A wireless communication system providing voice
recognition services, the system comprising:-

a wireless communication device providing
voice input for voice recognition processing on
a first communication path and receiving a sym-
bolic data file representing the processed voice
input on a second communication path; and
a server device running a voice recognition ap-
plication receiving voice input from the wireless
communication device on the first communica-
tion path, converting the received voice input
into a symbolic data file and forwarding the
symbolic data file to the wireless device using
the second communication path.